Text-Independent Speaker Verification Using 3D Convolutional Neural Networks

Authors: Jeremy Dawson, Nasser M. Nasrabadi, Amirsina Torfi
Publication date: 06/06/2018

In this paper, a novel method using a 3D Convolutional Neural Network (3D-CNN) architecture is proposed for speaker verification in the text-independent setting. One of the main challenges is the creation of speaker models. Most previously reported approaches create speaker models by averaging the features extracted from a speaker's utterances, which is known as the d-vector system. In this paper, we propose adaptive feature learning using 3D-CNNs for direct speaker model creation: in both the development and enrollment phases, an identical number of spoken utterances per speaker is fed to the network to represent the speaker's utterances and create the speaker model. This allows the system to capture speaker-related information and, at the same time, to be more robust to within-speaker variation. We demonstrate that the proposed method significantly outperforms the traditional d-vector verification system. Moreover, by utilizing 3D-CNNs, the proposed system offers an alternative to the traditional d-vector system, which is a one-shot speaker modeling system.
Comment: Accepted to be published in IEEE International Conference on Multimedia and Expo (ICME) 201

arXiv.org e-Print Archive

Crossref
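For context, the d-vector baseline described in the abstract can be sketched as follows: each enrollment utterance is mapped to a fixed-length embedding, the speaker model is the average of those embeddings, and a test utterance is accepted if its cosine similarity to the model exceeds a threshold. This is a minimal sketch under stated assumptions, not the authors' code: the embeddings are assumed to come from some already trained network (not shown), L2 normalization before and after averaging is a common convention rather than something the abstract specifies, and the function names and the 0.7 threshold are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def enroll_d_vector(utterance_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: average per-utterance embeddings into a speaker model."""
    if not utterance_embeddings:
        raise ValueError("need at least one enrollment utterance")
    dim = len(utterance_embeddings[0])
    normed = [l2_normalize(e) for e in utterance_embeddings]
    # Element-wise mean over utterances (the "averaging" step of the d-vector system).
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def cosine_score(model: Sequence[float], test_embedding: Sequence[float]) -> float:
    """Cosine similarity between a unit-length model and a test embedding."""
    t = l2_normalize(test_embedding)
    return sum(m * x for m, x in zip(model, t))


def verify(model: Sequence[float], test_embedding: Sequence[float],
           threshold: float = 0.7) -> bool:
    """Accept the identity claim if the score reaches the (assumed) threshold."""
    return cosine_score(model, test_embedding) >= threshold


if __name__ == "__main__":
    model = enroll_d_vector([[1.0, 0.0], [0.0, 1.0]])
    print(verify(model, [3.0, 2.5]), verify(model, [1.0, -1.0]))
```

The 3D-CNN approach in the abstract replaces the explicit averaging step: instead of embedding utterances one at a time and averaging afterwards, a fixed number of utterances per speaker is fed to the network together, so the speaker model is learned directly.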

Semi-supervised Multi-sensor Classification via Consensus-based Multi-View Maximum Entropy Discrimination

Authors: Alfred O. Hero III, Nasser M. Nasrabadi, Tianpei Xie
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 05/07/2015

In this paper, we consider multi-sensor classification when a large number of unlabeled samples is available. The problem is formulated under the multi-view learning framework, and a Consensus-based Multi-View Maximum Entropy Discrimination (CMV-MED) algorithm is proposed. By iteratively maximizing the stochastic agreement between multiple classifiers on the unlabeled dataset, the algorithm simultaneously learns multiple high-accuracy classifiers. We demonstrate that the proposed method can yield improved performance over previous multi-view learning approaches by comparing performance on three real multi-sensor data sets.
Comment: 5 pages, 4 figures. Accepted at the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 15

arXiv.org e-Print Archive

Crossref
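To make "stochastic agreement between multiple classifiers on the unlabeled dataset" concrete, one simple way to quantify it is the probability that labels drawn independently from two views' predictive distributions coincide, averaged over unlabeled samples. The sketch below illustrates only that quantity and a normalized-product consensus distribution; it is an assumption chosen for illustration, not the CMV-MED algorithm, which works within the maximum entropy discrimination framework and is not reproduced here. All function names are hypothetical.

```python
from typing import List, Sequence


def stochastic_agreement(p: Sequence[float], q: Sequence[float]) -> float:
    """Probability that labels drawn independently from p and q are equal."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same label set")
    return sum(pi * qi for pi, qi in zip(p, q))


def mean_agreement(posteriors_a: Sequence[Sequence[float]],
                   posteriors_b: Sequence[Sequence[float]]) -> float:
    """Average agreement of two views' classifiers over the unlabeled samples."""
    if len(posteriors_a) != len(posteriors_b) or not posteriors_a:
        raise ValueError("need the same non-zero number of samples from both views")
    total = sum(stochastic_agreement(p, q)
                for p, q in zip(posteriors_a, posteriors_b))
    return total / len(posteriors_a)


def consensus_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized element-wise product: labels both views favor get more mass."""
    prod = [pi * qi for pi, qi in zip(p, q)]
    z = sum(prod)
    if z == 0.0:
        raise ValueError("the two views assign disjoint support; no consensus")
    return [x / z for x in prod]


if __name__ == "__main__":
    view_a = [[0.8, 0.2], [0.3, 0.7]]  # per-sample class posteriors, sensor A
    view_b = [[0.6, 0.4], [0.4, 0.6]]  # per-sample class posteriors, sensor B
    print(mean_agreement(view_a, view_b))
    print(consensus_distribution(view_a[0], view_b[0]))
```

In a multi-view semi-supervised loop, a quantity like `mean_agreement` would be pushed upward on unlabeled data while each view's classifier is also fit to the labeled data; how CMV-MED actually couples these terms is described in the paper itself.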